Self-organization, where spontaneous orderings occur under driven conditions, is one of the hallmarks of
biological systems. We consider a statistical mechanical treatment of the biased distribution of such 
organized states, which become favored as a result of their catalytic activity under chemical driving forces. A 
generalization of the equilibrium canonical distribution describes the stationary state, which can be used to 
model shifts in conformational ensembles sampled by an enzyme in working conditions. The basic idea is 
applied to the process of biological information generation from random sequences of heteropolymers, 
where unfavorable Shannon entropy is overcome by the catalytic activities of selected genes. The ordering 
process is demonstrated with the genetic distance to a genotype with high catalytic activity as an order 
parameter. The resulting free energy can have multiple minima, corresponding to disordered and organized 
phases with first-order transitions between them. 

Despite enormous progress in understanding the characteristics of life's building blocks and their interactions 1 ,
many aspects of processes occurring in living organisms continue to pose challenges to physics-based
explanations. A major difficulty is in characterizing their organization and function, which tend to
appear spontaneously under suitable conditions, in stark contrast to common experience with nonliving matter
ruled by increasing disorder. The term self-organization has been used widely to describe these spontaneous
appearances of highly ordered structures, not only within biological systems but also in higher-level organizations
including networks 2-6 , and complex dynamical systems exhibiting phenomena such as disease progression 7-9 and
neural computation 10 . Theoretical approaches include wide-ranging studies of driven
systems showing self-organized criticality 11,12 with implications for extinction dynamics 13 , dynamical systems
views with analogies to equilibrium phase transitions 14 , and concepts centered on autocatalysis, evolution, and
selection 15 . Further studies in this interdisciplinary field include models of genetic regulatory networks and cell
differentiation 16 , dynamic clustering in active media 17 , and the study of Boolean network dynamics 18 . However,
one characteristic common to current approaches to studying self-organization is the lack of concrete
connections to equilibrium statistical mechanics.

We focus in this paper on chemically driven systems and describe an approach extending the equilibrium 
statistical mechanical concepts to cover the stationary distribution of self-organized states. Our approach, which 
is based on a combination of equilibrium theory and enzyme kinetics, will allow us to distinguish self-organization
from self-assembly, a related but distinct class of phenomena in which ordered structures are favored in
equilibrium because of certain structural features present within the constituents. A familiar example is the 
formation of micelles and lipid bilayers stabilized by hydrophobic interactions 19 , for which well-established 
and quantitative theories now exist 20 . 

Self-organization, in contrast, is a sustainable nonequilibrium process in which a system spontaneously 
increases and maintains its degree of ordering as a result of interactions and exchanges of matter with its
surroundings. Typically the system is in thermal and barometric equilibrium with its surroundings, but is driven
by the supply and extraction of chemical species: the 'food' and 'waste' molecules. The point of view adopted in this
paper is that the ordering occurs when the system has the potential to catalyze reactions involving these externally 
controlled species in the favored direction. We use a simple description of this effect to derive a biased distribution 
of system configurations away from equilibrium. The driving force increasingly favors organized states that 
would have negligible probabilities of occupation in equilibrium. 

One immediate consequence of such effects amenable to current experimental investigations is the shift in
conformational distributions of protein enzymes under stationary working conditions compared to
equilibrium. Recent developments in single-molecule experimental
techniques 21,22 have uncovered many surprises challenging the traditional
views on how enzymes operate, including the classical
'induced fit' mechanism, which assumed that an enzyme mostly
inhabits a single conformation, only changing it as a result of substrate
binding. An increasing body of experimental evidence suggests
that many enzymes instead sample a wide range of conformational
states without ligands, while a substrate binding event shifts the
equilibrium toward states optimal for catalytic activity (the
'conformational selection' mechanism) 23-25 . Theoretical studies so far have
focused mostly on the effects this conformational heterogeneity
has on kinetic velocity 25,26 . The theoretical consideration presented
in this paper provides a simple statistical mechanical description of
the nonequilibrium enzyme conformational distribution that can be
used to interpret single-molecule experiments.

As a second, more fundamental application, we address the issue
of how modern proteins capable of exhibiting ordering that
defies entropic costs (e.g., via conformational changes that
activate their enzymatic function), or rather the biological
information encoding their structure and function, could have arisen
spontaneously. Viewed from the information theoretic perspective 27 ,
the generation of such biological information is a self-organization
process where random heteropolymers with maximum entropy were
replaced by highly conserved genes. Considerations of this process
can serve as a bridge between equilibrium statistical mechanics and
current well-developed statistical approaches to modeling
evolution 28 , as well as highly successful recent developments in data-driven
characterizations of biological genotype-phenotype spaces 29,30 and
reconstruction of the evolutionary history of extant metabolic
pathways 31 .

There are two broad classes of approaches put forward to describe
such an ordering of biochemical sequences. One highly influential
perspective centers on the properties of the minimal networks of
autocatalytic species 15,32-36 , where self-organization becomes possible
when the degree of diversity of the autocatalytic set exceeds a critical
value. A different approach, represented by the quasispecies
theory 37,38 and theoretical works based on it 39-45 , takes the
self-replication population dynamics of sequences as a starting point.
The order-disorder transition of genetic information occurs as the
mutation rate of the replication process crosses a threshold value.

Based on the general consideration of the stationary distribution
under driven conditions, we consider in this paper the question of
whether the ordering in sequence space can occur purely from chemical
driving forces without the mechanism of competition based on
self-replication postulated in quasispecies theory. It is shown that
there exists a first-order transition from the phase characterized by
random sequences to the phase dominated by the few active sequences,
irrespective of the specific population dynamics of heteropolymers.
Our thermodynamic consideration thus connects the biochemical
self-organization more directly to chemical driving forces and reveals
its close correspondence to equilibrium phase transitions,
complementing the autocatalytic kinetics-based and population-based
approaches.

Results 

Self-organization. We consider a system in thermal equilibrium
with a reservoir (Fig. 1), characterized by a set of (coarse-grained)
states $n$ and corresponding (free) energies $E_n$. The equilibrium canonical
distribution of finding the system in state $n$ is $P_n^{(\mathrm{eq})} \propto e^{-E_n}$ (we
measure energy and entropy in units of temperature and Boltzmann
constant, respectively). An example of $n$ is a variable indicating
whether an enzyme is in one conformational state or another, defined
for instance in terms of an angle or distance between certain
subdomains within the protein. When driven by the reservoir, each
state $n$ has some degree of catalytic activity toward a reaction R → P
imposed by the reservoir:




Figure 1 | Self-organization under chemical driving force. A nonequilibrium driving force of the reservoir biases the equilibrium inside the
system, where the ingredients for an enzyme switch between disordered (D)
and organized (O) states. The enzyme becomes active only in the latter
state, catalyzing the reaction from substrate R into product P. The reservoir
supplies R in excess quantities versus P.



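$$R + S_n \underset{k_n^{(-1)}}{\overset{k_n^{(1)}}{\rightleftharpoons}} S_n{\cdot}R \underset{k_n^{(-2)}}{\overset{k_n^{(2)}}{\rightleftharpoons}} S_n + P, \tag{1}$$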



where $S_n$ is the system in state $n$, $S_n{\cdot}R$ denotes a complex with a
bound R, and the rate constants $k_n^{(\pm 1)}$, $k_n^{(\pm 2)}$ vary depending on $n$. This
scheme of enzyme catalysis under different conformational states
corresponds to the conformational selection mechanism (as
opposed to the induced fit), now supported by a growing body of
experimental evidence 24,46 . The system also equilibrates between
different states $n$ and $m$:



$$S_n \underset{k_{nm}}{\overset{k_{mn}}{\rightleftharpoons}} S_m, \tag{2}$$



where $k_{mn}$ is the rate of conversion from $n$ to $m$. The rate equation for
the probability $P_n^{(r)}$ of being in state $n$ with a bound R is

$$\dot{P}_n^{(r)} = c_r k_n^{(1)} P_n^{(0)} - k_n^{(-1)} P_n^{(r)} - k_n^{(2)} P_n^{(r)} + c_p k_n^{(-2)} P_n^{(0)}, \tag{3}$$

where $P_n^{(0)}$ is the probability for $n$ without R, and $c_r$, $c_p$ are the
concentrations of R and P, respectively. At steady state,

$$P_n^{(r)} = z_n P_n^{(0)}, \tag{4}$$

where

$$z_n = \frac{c_r k_n^{(1)} + c_p k_n^{(-2)}}{k_n^{(-1)} + k_n^{(2)}}, \tag{5}$$

or if $c_p = 0$, $z_n = c_r K_n = c_r/K_m$, where $K_n^{-1} = K_m = (k_n^{(-1)} + k_n^{(2)})/k_n^{(1)}$ is the Michaelis constant.
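The step from the rate equation (3) to the steady-state relations (4) and (5) can be spelled out as follows; this is only a rearrangement of Eq. (3) with its time derivative set to zero, using the same symbols as in the text.

```latex
\begin{align}
0 &= \left( c_r k_n^{(1)} + c_p k_n^{(-2)} \right) P_n^{(0)}
     - \left( k_n^{(-1)} + k_n^{(2)} \right) P_n^{(r)} , \\
\frac{P_n^{(r)}}{P_n^{(0)}} &=
   \frac{c_r k_n^{(1)} + c_p k_n^{(-2)}}{k_n^{(-1)} + k_n^{(2)}} \equiv z_n , \\
z_n \big|_{c_p = 0} &= \frac{c_r k_n^{(1)}}{k_n^{(-1)} + k_n^{(2)}} = \frac{c_r}{K_m} .
\end{align}
```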

In the absence of driving forces from the reservoir, the set of states
$\{n\}$ satisfies the detailed balance condition, $k_{mn} P_n^{(0)} = k_{nm} P_m^{(0)}$, which
remains valid even when $P_n^{(r)} > 0$, because Eq. (3) does not couple $n$
and $m$. This assumption could be violated if, for instance, the reaction
scheme Eq. (1) is generalized such that $S_n{\cdot}R$ could turn into states
$m \neq n$, which may yield useful models for molecular motors 47-49 . We
will not consider such cases in this paper. In contrast to Eq. (2), the
two main reaction steps in Eq. (1) do not satisfy detailed balance
except in equilibrium.

We have $P_m^{(0)}/P_n^{(0)} = k_{mn}/k_{nm} = P_m^{(\mathrm{eq})}/P_n^{(\mathrm{eq})} = e^{-E_m + E_n}$, and may
thus write

$$P_n^{(0)} = \frac{e^{-E_n}}{Q}, \tag{6}$$

where $Q$ is a normalization constant. From Eqs. (4) and (6), the total
probability $P_n$ to be in state $n$ is

$$P_n = P_n^{(0)} + P_n^{(r)} = \frac{e^{-E_n^*}}{Q}, \tag{7}$$

where

$$E_n^* = E_n - \ln(1 + z_n) \tag{8}$$

is a generalized free energy. Since $\sum_n P_n = 1$, we have

$$Q = \sum_n (1 + z_n)\, e^{-E_n} = \sum_n e^{-E_n^*}, \tag{9}$$

which is therefore a generalized partition function. The significance
of the partition function in equilibrium statistical mechanics carries over
to this nonequilibrium extension: the expectation value of a
state-dependent quantity $q_n$,

$$\langle q \rangle = \sum_n q_n P_n, \tag{10}$$

can be calculated from $Q(f) = \sum_n e^{-E_n^* + f q_n}$ by $\langle q \rangle = \partial \ln Q/\partial f|_{f=0}$.
From Eq. (1), the stationary velocity $v$ can be written as

$$v = \sum_n \left[ k_n^{(2)} P_n^{(r)} - c_p k_n^{(-2)} P_n^{(0)} \right] = \frac{1}{Q} \sum_n \frac{c_r k_n^{(1)} k_n^{(2)} - c_p k_n^{(-1)} k_n^{(-2)}}{k_n^{(-1)} + k_n^{(2)}}\, e^{-E_n}, \tag{11}$$

which is a constitutive relation connecting the nonequilibrium flux $v$
to thermodynamic forces: at steady state, the system entropy is constant,
and the total entropy production rate is $\dot{S}$, where $S = S(n_r, n_p)$ is
the entropy of the reservoir, a function of the numbers $n_i$ of species $i$
($i$ = R, P). Its rate of change due to the catalyzed reaction is
$\dot{S} = -\mu_r \dot{n}_r - \mu_p \dot{n}_p = F v$, where $\mu_i$ are the chemical potentials 50 and
$F = \mu_r - \mu_p$, since $\dot{n}_p = -\dot{n}_r = v$. With the concentrations $c_i = c_i^{\circ} e^{\mu_i}$,
where $c_i^{\circ}$ is a constant, Eq. (11) gives a closed-form relation of the net
flux $v$ to $\mu_i$.
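As an illustration of Eqs. (5)-(9) and (11), the following minimal Python sketch evaluates the driven stationary distribution and the net flux for a small set of states. The function names, the tuple layout of the rate constants, and the numerical values are hypothetical choices made for this example only; energies are in units of temperature as in the text.

```python
import math

def activity(c_r, c_p, k1, km1, k2, km2):
    """z_n of Eq. (5); km1 and km2 stand for k^(-1) and k^(-2)."""
    return (c_r * k1 + c_p * km2) / (km1 + k2)

def stationary_state(E, rates, c_r, c_p):
    """Return (P, E_star, Q, v) for state energies E and per-state rate tuples."""
    z = [activity(c_r, c_p, *r) for r in rates]
    E_star = [e - math.log1p(zn) for e, zn in zip(E, z)]   # Eq. (8)
    Q = sum(math.exp(-es) for es in E_star)                 # Eq. (9)
    P = [math.exp(-es) / Q for es in E_star]                # Eq. (7)
    v = 0.0
    for e, zn, (k1, km1, k2, km2) in zip(E, z, rates):
        p0 = math.exp(-e) / Q                               # Eq. (6)
        # product release from the bound state minus the reverse reaction
        v += k2 * zn * p0 - c_p * km2 * p0
    return P, E_star, Q, v

if __name__ == "__main__":
    # Illustrative values: an inactive state D and an active state O, 3 units higher in energy
    E = [0.0, 3.0]
    rates = [(0.0, 1.0, 0.0, 0.0), (5.0, 1.0, 10.0, 0.1)]
    for c_r in (0.0, 1.0, 100.0):
        P, _, _, v = stationary_state(E, rates, c_r, c_p=0.0)
        print(c_r, [round(p, 4) for p in P], round(v, 4))
```

With these illustrative numbers the occupation of the higher-energy active state grows with the reactant concentration, which is the biasing effect described by Eq. (7).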

The simplest special case is where there are only two states, $n$ = D, O,
corresponding to disordered and organized states, respectively (Fig. 1), with
$\Delta E = E_O - E_D > 0$. From Eq. (8), if $k_D^{(1)} = 0$ (so that $z_D = 0$) and $c_p = 0$, the relative
stability of D over O is reversed when $c_r > c_r^* = K_m (e^{\Delta E} - 1)$. Typical
$K_m$ values of enzymes range from μM to mM 1 . For an enzyme
with $K_m$ = 1 μM, for instance, $c_r^* \approx 22$ mM for $\Delta E = 10$. It is worth
noting that in contrast to self-assembly, the stabilization of states
with $\Delta E > 0$ arises strictly from the chemical driving force of the
reservoir. If the matter flow is cut off, the system would quickly revert
to D: the organized structure in the system 'dies'.
For the two-state model, Eq. (11) becomes

$$v = \frac{k^{(2)} c_r - c_p k^{(-1)} k^{(-2)}/k^{(1)}}{(1 + e^{\Delta E}) K_m + c_r + c_p k^{(-2)}/k^{(1)}}, \tag{12}$$

where the rate constants are those for state O. When $c_p = 0$, Eq. (12)
reduces to the Michaelis-Menten expression with the substrate
concentration for half-maximum velocity increased by a factor of $1 + e^{\Delta E}$
due to the presence of the noncatalytic state D. Equilibrium is
reached when $c_r/c_p = k^{(-1)} k^{(-2)}/k^{(1)} k^{(2)}$ and $v = 0$.
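The two-state results can be checked numerically. The sketch below is a hedged illustration rather than part of the original analysis: it evaluates the threshold concentration $c_r^* = K_m(e^{\Delta E} - 1)$ and the velocity of Eq. (12); the function names and the rate-constant values used in any call are assumptions introduced for the example.

```python
import math

def threshold_concentration(K_m, dE):
    """c_r^* = K_m (exp(dE) - 1), in the same concentration units as K_m."""
    return K_m * math.expm1(dE)

def two_state_velocity(c_r, c_p, dE, k1, km1, k2, km2):
    """Eq. (12); the rate constants are those of the organized state O."""
    K_m = (km1 + k2) / k1
    numerator = k2 * c_r - c_p * km1 * km2 / k1
    denominator = (1.0 + math.exp(dE)) * K_m + c_r + c_p * km2 / k1
    return numerator / denominator

if __name__ == "__main__":
    # K_m = 1 micromolar expressed in molar units, dE = 10 as in the text
    print(threshold_concentration(1e-6, 10.0))
```

For $c_p = 0$ the function reaches half of its maximum value $k^{(2)}$ at $c_r = K_m(1 + e^{\Delta E})$, in line with the remark on the shifted Michaelis-Menten form.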

More general cases can be illustrated further by a simple toy model
of an enzyme with a continuous angular degree of freedom $\theta$. This
angle represents the degree of closing of the binding pocket at the
center. As $\theta$ becomes smaller, the catalytic activity increases linearly,
while the thermal stability decreases:

$$E(\theta) = g_0 (1 - \theta/\pi), \qquad K_m^{-1} = K (1 - \theta/\pi), \tag{13}$$

and $c_p = 0$ such that $z = c_r/K_m$. Figure 2 shows the changes in angular
distribution and mean angle from the equilibrium to driven cases as
the driving force $\zeta = c_r K$ increases.

Figure 2 | Organization induced under driven conditions. The model
defined by Eq. (13) was used with $g_0 = 5$. (a) Angular distribution $P(\theta)$.
(b) Mean angle $\langle\theta\rangle$ in units of $\pi$ as a function of the reduced reactant
concentration $\zeta = c_r K$.
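The shift of the angular distribution in this toy model can be reproduced with a simple numerical quadrature. The sketch below assumes the model of Eq. (13) with weight $(1 + \zeta(1 - \theta/\pi))\,e^{-E(\theta)}$, uses a midpoint rule on a uniform grid, and introduces hypothetical function names; it is an illustration, not the procedure used to produce Fig. 2.

```python
import math

def angular_distribution(zeta, g0=5.0, n=2000):
    """Return midpoint grid angles and the normalized density P(theta)."""
    d = math.pi / n
    thetas = [(i + 0.5) * d for i in range(n)]
    weights = []
    for t in thetas:
        x = 1.0 - t / math.pi                    # x = 1 - theta/pi
        weights.append((1.0 + zeta * x) * math.exp(-g0 * x))
    Q = sum(weights) * d                          # partition function, Eq. (9) as an integral
    return thetas, [w / Q for w in weights]

def mean_angle(zeta, g0=5.0, n=2000):
    """Mean angle <theta> under the driven stationary distribution."""
    thetas, P = angular_distribution(zeta, g0, n)
    d = math.pi / n
    return sum(t * p for t, p in zip(thetas, P)) * d

if __name__ == "__main__":
    for zeta in (0.0, 1.0, 10.0, 100.0):
        print(zeta, mean_angle(zeta) / math.pi)   # mean angle in units of pi
```

Increasing `zeta` moves weight toward small angles, where the pocket is closed and the catalytic activity is highest.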

Information generation. Virtually all self-organized structures in
biological systems are based on biopolymers, including proteins
and nucleic acids (RNAs and DNAs), which carry biological
information that is copied over generations with mutations. A
satisfactory theory of self-organization therefore must address how
such biological information could have been generated. We show
below that the chemical driving force imposed by the reservoir
induces an order-disorder transition in sequence space, where the
stationary distribution of genotypes becomes dominated by sequences
with catalytic activity.

We consider a chemically driven environment where nucleotide
sequences of a fixed length $l$ are continually synthesized and
degraded, e.g., against a solid support. Each nucleotide at different
sites on the chain can contain one of four possible bases, $b$ = a, g, c, u,
with the total number of all possible sequences $s_n = \{b_1, \ldots, b_l\}$ given by $\Omega = 4^l$.
Without additional driving forces other than the chain synthesis, the
resulting sequences would be mostly random. This pool of random
sequences corresponds to the system in the disordered D state in
Fig. 1. The stationary probability $P_n$ for an RNA chain randomly
picked from the population to have sequence $s_n$ is $P_n \propto e^{-E_n}$, where
$E_n$ describes the intrinsic relative stability of each sequence.

If the reservoir exerts a driving force $F = \mu_r - \mu_p > 0$ for a reaction
R → P, Eq. (1) describes the catalysis with $S_n$ referring to a chain with
sequence $n$: RNA chains with suitable sequences can fold and
catalyze reactions, as does the catalytic core of modern ribosomes
carrying out the synthesis of proteins during translation 51,52 .
Equation (8) gives the biased energy $E_n^*$ under the driven condition,
where we assume $E_n = 0$ for simplicity. We note that $z_n$ given by Eq.
(5) in this case represents the enzymatic activity of the sequence $n$, or
its fitness: the generalized free energy $E_n^*$ is more negative for
sequences with higher fitness. Equation (7) therefore implies that
as the system reaches the stationary state under the influence of the
reservoir driving force, the distribution would become peaked
around sequences with low energy, or high fitness. The analogy of
the evolution of populations toward genotypes with higher fitness
(the climbing of peaks on the fitness landscape) to the minimization
of energy in equilibrium systems has been observed before, notably
by Kauffman 15 (but also note the proposal that fitness landscapes are
intrinsically dynamic quantities 53 ). Here, we see this correspondence
more directly from thermodynamic considerations.

This identification of the fitness of genotypes as their catalytic
activity contains its common definition in evolutionary theories
(the replication rate) as a special case where R → P is the
self-replication reaction. The order-disorder transition we describe
below, on the other hand, does not rely on the assumption of the
existence of self-replication machinery, and may provide an
understanding of how the large variety of genes found in modern genomes
coding for enzymes with many different functions could have
originated.

Emergence of a gene. To describe the biased population of sequences
under driven conditions in more detail, the fitness (or energy)
landscape $E^*$ needs to be specified. Physically, we only need to be
convinced of the existence of sequences with high catalytic activities
toward the reaction imposed by the reservoir. The fitness landscape
describes the dependence of fitness values as we depart from these
genotypes in sequence space. The theory of self-organization allows
for a quantitative description of the dominance of highly fit
genotypes with a discontinuous first-order transition.

For concreteness, we adopt the class of landscapes widely
studied in applications of the quasispecies theory 28,37,39,41,42,54 , where
the fitness is given as a function of Hamming distance to the
master sequence, which becomes the natural order parameter. It
is worth emphasizing, however, that in contrast to the error
catastrophe transition in the quasispecies theory, the order-disorder
transition we derive below is thermodynamic in origin, independent
of specific population dynamics, and is applicable to any reaction
for which a sufficiently strong enzymatic genotype exists in
the sequence space.

The distance $h_{nm}$ between two sequences $n$ and $m$ is the total
number of nucleotide positions at which the base identities differ.
For $l$-mers, $0 \le h \le l$. The probability $P_h$ of sequences with distance $h$
to the master sequence is

$$P_h = \Omega_h \frac{e^{-E_h^*}}{Q} = \frac{e^{-G_h}}{Q}, \tag{14}$$

where the free energy

$$G_h = E_h^* - S_h \tag{15}$$

is given in terms of the entropy $S_h = \ln \Omega_h$ and the number of
genotypes $\Omega_h$ of distance $h$ to the master sequence. This number
can be written as

$$\Omega_h = C_h^l\, 3^h, \tag{16}$$

which gives the number of different ways of choosing $h$ sites within $l$
nucleotide positions, each site having 3 possible bases that differ from
that of the master sequence. The prefactor is the binomial coefficient,
$C_h^l = l!/h!(l-h)!$.


In Fig. 3(a), a model fitness landscape

$$K(h) = K e^{-h^2/2\xi^2}, \tag{17}$$

where the fitness values are distributed by a Gaussian function centered
at the master sequence ($h = 0$) and $c_p = 0$ such that $z_h = c_r K(h)$,
was used to calculate the free energy profile as a function of $h$ for
three different values of the driving force $\zeta = c_r K$. Under $\zeta = 0$, the
minimum occurs near $h = 3l/4$, which is the average distance of
random sequences of length $l$ from the master sequence, because
random sequences have the maximum entropy. As $\zeta$ increases, a
new minimum develops near $h = 0$, whose location is determined
by the balance of energy $E_h^*$ and entropy $S_h$. The stationary distribution
$P_h$ peaked at the minimum of $G_h$ can be interpreted physically
as follows: the probability of finding genotypes away from the master
sequence is affected by the fitness cost (the energy increases with
increasing $h$, reducing $P_h$) and the number of possible sequences
(the entropy increases with increasing $h$, making $P_h$ larger).
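The competition between energy and entropy in $G_h$ can be made concrete with a short calculation. The sketch below assumes the Gaussian landscape $K(h) = K e^{-h^2/2\xi^2}$ with $E_n = 0$, restricts $h$ to integer values (unlike the continuous interpolation mentioned in Methods), and uses hypothetical function names; it is meant only to illustrate how a second minimum appears under driving.

```python
import math

def free_energy_profile(zeta, l=10, xi=1.0):
    """G_h = E*_h - S_h for integer distances h = 0..l, Eqs. (15)-(17)."""
    G = []
    for h in range(l + 1):
        z_h = zeta * math.exp(-h * h / (2.0 * xi * xi))   # z_h = c_r K(h)
        E_star = -math.log1p(z_h)                          # Eq. (8) with E_n = 0
        S = math.log(math.comb(l, h) * 3 ** h)             # ln of Omega_h, Eq. (16)
        G.append(E_star - S)
    return G

def local_minima(G):
    """Indices h at which G is strictly lower than both neighbors."""
    minima = []
    for h, g in enumerate(G):
        left = G[h - 1] if h > 0 else math.inf
        right = G[h + 1] if h < len(G) - 1 else math.inf
        if g < left and g < right:
            minima.append(h)
    return minima

if __name__ == "__main__":
    for zeta in (0.0, 1e2, 1e4):
        G = free_energy_profile(zeta)
        print(zeta, local_minima(G))
```

On the integer grid, the undriven profile has a single minimum close to $3l/4$, while a sufficiently large `zeta` produces an additional minimum at small $h$ separated from the disordered one by a barrier.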

The qualitative features of $G_h$ for different driving forces resemble
the order-disorder transitions observed in equilibrium fluids, where
a condensed phase can coexist with the disordered phase. In the
corresponding phase diagram shown in Fig. 3(b) as a function of
distance $h$ and driving force $\zeta$, the coexistence region between the
disordered D phase and the organized O phase shrinks as the fitness
peak width $\xi$ increases and $\zeta$ decreases, vanishing at a critical point
where the D-O transition becomes continuous. The D phase is characterized
by random sequences, while in the O phase, the sequence
distribution is peaked at a short distance from the catalytically active
master sequence. The position of the minimum in $G_h$ in the O phase
in Fig. 3(a) corresponds to the most likely distance value $h$ expected
within this population.

Figure 3 | Stability of catalytic sequences under driven conditions.
(a) Free energy profile $G_h$ as a function of distance $h$ to the master sequence
given by Eq. (15) for $l = 10$. The fitness landscape (17) was used with $\xi = 1$.
(b) Phase diagram showing the coexistence curve of the disordered and
organized phases. The dashed line in (b) shows an 'isotherm' (order
parameter $h$ as a function of driving force $\zeta$) for $\xi = 1$ with the arrows
indicating the direction of increasing $\zeta$.

SCIENTIFIC REPORTS | 3 : 3329 | DOI: 10.1038/srep03329

The transition from the disordered to organized phases is accompanied
by the dominance of catalytically active states leading to a bias
in the distribution $P_n$, under which sequences carry information encoding
enzymes. The amount of this information generated is quantified
by the reduction in Gibbs entropy associated with $P_n$, known as the
information content (per site) 27,55,56 ,

$$I_c = S^{(\mathrm{eq})} + \frac{1}{l} \sum_n P_n \ln P_n = \ln 4 - \frac{1}{l}\left( \ln Q + \langle E^* \rangle \right), \tag{18}$$

where $S^{(\mathrm{eq})}$ is the entropy (per site) of $P_n^{(\mathrm{eq})}$, which is ln 4 (or 2 bits) here with $E_n = 0$.
The Gibbs entropy in Eq. (18) is that of the system, which
remains constant over time in stationary states, while entropy is
produced steadily in the reservoir. Figure 4 shows the growth of $I_c$
with increasing driving force $\zeta$ for different values of fitness peak
width $\xi$. The information content makes a jump as the D-to-O transition
occurs, with the asymptotic value increasing with decreasing $\xi$.
The information content of 2 bits per site is the upper limit reached
when the final biased distribution is very sharply peaked.
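The information content can be evaluated by grouping sequences according to their distance $h$, since all sequences at the same distance share the same $E_h^*$. The following sketch assumes $E_n = 0$, the Gaussian landscape $K(h) = K e^{-h^2/2\xi^2}$, and the per-site form $I_c = \ln 4 - (\ln Q + \langle E^* \rangle)/l$; the function name is hypothetical.

```python
import math

def information_content_bits(zeta, l=10, xi=1.0):
    """Information content per site, in bits, for driving force zeta."""
    Q = 0.0
    weighted_E = 0.0
    for h in range(l + 1):
        omega = math.comb(l, h) * 3 ** h                       # sequences at distance h
        E_star = -math.log1p(zeta * math.exp(-h * h / (2.0 * xi * xi)))
        w = omega * math.exp(-E_star)                          # total weight at distance h
        Q += w
        weighted_E += w * E_star
    mean_E = weighted_E / Q                                    # <E*>
    I_c = math.log(4.0) - (math.log(Q) + mean_E) / l           # nats per site
    return I_c / math.log(2.0)                                 # convert to bits

if __name__ == "__main__":
    for zeta in (0.0, 1.0, 1e2, 1e4, 1e6):
        print(zeta, information_content_bits(zeta))
```

The result vanishes without driving, where all $4^l$ sequences are equally likely, and is bounded above by 2 bits per site.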

Multiple genes. Equation (1) can be easily generalized to cases where
there is more than one externally imposed reaction, which would
lead to the coexistence of multiple genes. We consider the case of two
reactions (e.g., RNA elongation and another reaction such as ATP
hydrolysis). The order parameter is a vector $\mathbf{h} = (h_1, h_2)$ with two
components specifying distances to the master sequences $s_1$ and $s_2$. It
can then be shown that Eq. (8) becomes

$$E_{\mathbf{h}}^* = -\ln\left[ 1 + c_1 K_1(h_1) + c_2 K_2(h_2) \right], \tag{19}$$

where $c_1$ and $c_2$ are the concentrations of two reactants. We adopt
$K_{1,2}(h) = K(h)$ with Eq. (17) and let $\zeta_1 = c_1 K$ and $\zeta_2 = c_2 K$ be the set of
reduced driving forces.

The calculation of $\Omega_{\mathbf{h}}$ involves counting the number of sequences
with given distances $\mathbf{h}$ to the master sequences. In Fig. 5, the two
master sequences $s_1$ and $s_2$ are depicted with their sites grouped into
two sections, I and II, where nucleotides are different and identical
between the sequences, respectively. The length of section I is $h_{12} = d$.
To count the number of sequences $s_n$ with distances $h_1$ and $h_2$ to $s_1$
and $s_2$, we start with $s_1$ and first mutate a subset of $m$ sites from
section II, such that $\mathbf{h} = (m, d + m)$. The number of ways of doing
this is $C_m^{l-d}\, 3^m$ because there are three nucleotides different from
the original at each site in section II. We then mutate $k$ sites from section I into
the corresponding nucleotides of $s_2$, which results in $\mathbf{h} = (m + k, d + m - k)$.
The number of ways of doing this step is $C_k^d$, without any
nucleotide multiplicity because the target sequence is fixed. By
choosing $k = d + m - h_2$, the distance $h_2$ to $s_2$ is achieved.
Finally, we choose $p$ additional sites from section I and mutate them into
nucleotides distinct from both $s_1$ and $s_2$, after which $\mathbf{h} = (m + k + p, d + m - k)$.
The number of ways for this third step is $C_p^{d-k}\, 2^p$
because there are two nucleotides that can be chosen for each site.
Taking $p = h_1 - m - k = h_1 + h_2 - d - 2m$, we achieve the distance
$h_1$ to $s_1$. The total number of sequences is then given by

$$\Omega_{\mathbf{h}} = \sum_{m=m_0}^{m_1} C_m^{l-d}\, 3^m \cdot C_{d+m-h_2}^{d} \cdot C_{h_1+h_2-d-2m}^{h_2-m}\, 2^{h_1+h_2-d-2m}, \tag{20}$$

where the lower and upper limits of the summation can be deduced
by requiring the binomial coefficients to be well-defined:

$$m_0 = \max\{0,\; h_1 - d,\; h_2 - d\}, \tag{21a}$$

$$m_1 = \min\{h_1,\; h_2,\; l - d,\; \lfloor (h_1 + h_2 - d)/2 \rfloor\}, \tag{21b}$$

where $\lfloor x \rfloor$ is the largest integer not exceeding $x$.
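The counting argument can be checked by brute force for short chains. The sketch below implements the sum over $m$ with the three binomial factors described in the text, using the bounds obtained from requiring $0 \le k \le d$ and $0 \le p \le d - k$, and compares it with direct enumeration over all $4^l$ sequences. Function names and the choice of example master sequences are hypothetical.

```python
import math
from itertools import product

def omega(h1, h2, l, d):
    """Number of length-l sequences at distances (h1, h2) from two masters
    that differ from each other at d sites."""
    m0 = max(0, h1 - d, h2 - d)
    m1 = min(h1, h2, l - d, (h1 + h2 - d) // 2)
    total = 0
    for m in range(m0, m1 + 1):
        k = d + m - h2                  # section I sites switched to the base of s2
        p = h1 + h2 - d - 2 * m         # section I sites differing from both masters
        total += (math.comb(l - d, m) * 3 ** m
                  * math.comb(d, k)
                  * math.comb(h2 - m, p) * 2 ** p)   # h2 - m equals d - k
    return total

def brute_force_counts(s1, s2, alphabet="agcu"):
    """Histogram of (h1, h2) over all sequences of the same length as s1."""
    counts = {}
    for seq in product(alphabet, repeat=len(s1)):
        h1 = sum(a != b for a, b in zip(seq, s1))
        h2 = sum(a != b for a, b in zip(seq, s2))
        counts[(h1, h2)] = counts.get((h1, h2), 0) + 1
    return counts

if __name__ == "__main__":
    l, d = 6, 2
    s1 = "a" * l
    s2 = "g" * d + "a" * (l - d)        # differs from s1 at the first d sites
    counts = brute_force_counts(s1, s2)
    ok = all(omega(h1, h2, l, d) == counts.get((h1, h2), 0)
             for h1 in range(l + 1) for h2 in range(l + 1))
    print("formula matches enumeration:", ok)
```

Because the enumeration grows as $4^l$, this check is only practical for small $l$, which is sufficient to validate the closed-form sum.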

Figure 6 shows the free energy $G_{\mathbf{h}} = E_{\mathbf{h}}^* - S_{\mathbf{h}}$ as a function of $\mathbf{h}$ for
equilibrium ($\zeta_i = 0$) and strongly driven ($\zeta_1 = \zeta_2 = 10^4$) cases. The
landscape is bounded by values of $\mathbf{h}$ for which $\Omega_{\mathbf{h}} = 0$. The allowed
region can be deduced by requiring $m_0 \le m_1$ in Eqs. (21): $0 \le h_i \le l$,
$h_1 \le h_2 + d$, $h_2 \le h_1 + d$, and $h_1 + h_2 \ge d$. The single minimum in
Fig. 6(a) at $\mathbf{h} = (8, 8)$ corresponds to the disordered D phase, which is
the only phase in equilibrium. Under strong driving forces, in contrast,
up to two additional minima develop near the $h_2$ and $h_1$ axes
[Fig. 6(b)], which correspond to organized $O_1$ and $O_2$ phases dominated
by sequences close to $s_1$ and $s_2$, respectively. The global minimum
switches from D to $O_i$ (or both when $\zeta_1 = \zeta_2$) with increasing $\zeta_i$.
The phase diagram in Fig. 7 shows such stability changes within
the $\zeta_i$-space, where the three phases are separated by boundaries on
which neighboring phases can coexist. There is a triple point where
D, $O_1$ and $O_2$ phases can all coexist. As suggested by the single-gene
case in Fig. 3(b), increasing the fitness peak width parameter $\xi$ shifts
the D-$O_i$ boundaries toward smaller $\zeta_i$ values. The boundaries disappear
at a critical point. The symmetry 1 ↔ 2 in Figs. 6 and 7 is a
result of our choice of the same fitness landscape, Eq. (17), for the two
genes, and would be broken in more general cases.

Figure 4 | Information content versus driving force. The value given by
Eq. (18) (divided by ln 2 to convert $I_c$ into bits) is shown as a function of
driving force $\zeta$ for three different landscape width ($\xi$) values. The
maximum for nucleotide sequences is 2 bits per site.

Figure 5 | Calculation of entropy for two genes. The number of sequences
$\Omega(h_1, h_2)$ is counted, where $h_1$ and $h_2$ are the distances to the two master
sequences $s_1$ and $s_2$. Sections I and II are sites where the two sequences are
different and identical, respectively. New sequences $s_n$ are generated by first
choosing $m$ sites from section II of $s_1$ and mutating them, then mutating $k$ sites from
section I into the corresponding nucleotides of $s_2$, and $p$ sites away from both
$s_1$ and $s_2$, such that $\mathbf{h} = (m + k + p, d + m - k)$. The numbers of sites $k$ and
$p$ are chosen such that $\mathbf{h} = (h_1, h_2)$. Segments with the same filled patterns
represent the same sequences.

Figure 6 | Free energy landscape for two genes. The free energy $G_{\mathbf{h}}$ as a function of $\mathbf{h} = (h_1, h_2)$ is shown for $l = 10$, $d = 5$, and $\xi = 1$. (a) Equilibrium
condition where $\zeta_1 = \zeta_2 = 0$. There is a single minimum at $\mathbf{h} = (8, 8)$, corresponding to the D phase. (b) Strongly driven case where $\zeta_1 = \zeta_2 = 10^4$. A pair of
minima at $\mathbf{h} = (2, 6), (6, 2)$, corresponding to the $O_1$ and $O_2$ phases, becomes more stable than the D phase.



Discussion 

In this paper, we presented a statistical mechanical description of
biological self-organization under chemically driven conditions: the
system possesses a small subset of coarse-grained states which can
catalyze the reactions imposed externally by the reservoir. The entropic
cost of observing these organized states is overcome when the
driving forces are sufficiently strong, leading to biased stationary
distributions dominated by such organized states, or phases.

As a major application, we focused on the appearance of biological
information encoded into sequences of nucleotide strands. Chemical
driving forces lead to one or multiple peaks in sequence space, centered
on sequences that can catalyze the externally imposed reactions.
In this viewpoint, genes encoding enzymes spontaneously
appear and dominate the nucleotide sequence populations if the
driving forces are sufficiently strong to overcome the Shannon
entropy costs. Our analyses demonstrate that this transition into
self-organization in sequence space has much in common with equilibrium
phase transitions. Although the global stability of phases is
dictated by the phase diagram (Fig. 7) as in equilibrium, metastable
phases (and genes) can still be present together with the dominant
phase outside the coexistence region, with the relative population of
different sequences given by Eq. (7).

The phase transition observed here can occur irrespective of how
interconversion between sequences actually takes place (random
synthesis on solid support or replication of existing chains). The
nature of sequence evolution, however, would affect the dynamics
of ordering, which was not considered here. The establishment of the
stationary distribution, Eq. (7), requires sufficient exploration of all
sequences via Eq. (2), fastest with random synthesis but still taking
relaxation times that grow exponentially with $l$. High-fidelity replications
would slow down this relaxation, while allowing for the preservation
of information already discovered.

Figure 7 | Phase diagram for two genes. The parameter values are $l = 10$, $d = 5$,
$\xi = 1$. Lines represent phase boundaries on which neighboring phases
coexist.

The organized phases in Figs. 6 and 7 consist of groups of heteropolymers
independently acting as enzymes. One of these groups can
be polymerases catalyzing the synthesis of polymers. An aspect of
evolutionary transitions we may presume to have occurred, in particular,
is that of the emergence of 'selfishness', or the ability of the
polymerase gene to limit its action to its own replication only, excluding
others. This assumption is one of the starting points of the quasispecies
theory 37 . The selfishness is also likely to be closely related to
the conjoining of genes into genomes. It will be of interest to see how
we may understand the evolution of selfishness within the statistical
mechanical perspective.

The viewpoint we adopted for biological self-organization (the
stabilization of structures capable of catalyzing reactions imposed by
chemical driving forces) may also have relevance to the question of
how one may usefully define living organisms. Ruiz-Mirazo et al. 57
emphasized two main elements in such a definition: autonomy and
open-ended evolution. The former is a subset of self-organization
capable of auto-regulation, while the latter requires the establishment
of a division of labor between record-keeping (DNA) and expression
(proteins) of biological information. The perspective adopted and
elaborated in this paper shows how thermodynamic driving forces
both constrain and enable self-organization, which may prove useful
in understanding higher-level structures.

The statistical mechanical expression for stationary states and its
partition function, Eq. (9), can form a basis for calculating properties
of systems with more complex features than assumed here. In particular,
one may have multiple reactions coupled to each other, a
common situation in biochemical systems, which would lead to an
extension of Eq. (19) to a coupled 'hamiltonian'. The calculation of the
partition function would be akin to that for interacting systems in
equilibrium such as the Ising model. Such an extension would also
allow one to consider fairly large systems in which the system components
catalyze the formation of one another, forming an autocatalytic
network 15,32,33 . In such systems, the thermodynamic phase
transitions studied here may therefore precede and combine with
the transition to self-sustained autocatalytic organizations.

Methods

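For the toy model defined by Eq. (13), after replacing discrete sums with integrals and
adding a field $f$, Eq. (9) reads

$$Q(f) = \int_0^{\pi} d\theta\, (1 + c_r/K_m)\, e^{-E(\theta) + f\theta} = \frac{1}{g_0/\pi + f} \left[ e^{\pi f} - (1 + \zeta)\, e^{-g_0} + \frac{\zeta \left( e^{\pi f} - e^{-g_0} \right)}{g_0 + \pi f} \right], \tag{22}$$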



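and the mean angle (Fig. 2) can be calculated by

$$\langle\theta\rangle = \left. \frac{\partial \ln Q}{\partial f} \right|_{f=0} = \frac{\pi}{g_0} \left[ \frac{g_0 (g_0 + \zeta) - \zeta (1 - e^{-g_0})}{g_0 \left[ 1 - (1 + \zeta)\, e^{-g_0} \right] + \zeta (1 - e^{-g_0})} - 1 \right]. \tag{23}$$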
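The closed-form expressions for the toy model can be cross-checked against direct numerical integration. In the sketch below, the closed forms are those obtained by integrating $(1 + \zeta(1 - \theta/\pi))\,e^{-g_0(1 - \theta/\pi) + f\theta}$ over $0 \le \theta \le \pi$ and differentiating $\ln Q$ at $f = 0$; the midpoint quadrature, grid size, and function names are assumptions of this example.

```python
import math

def q_closed(f, zeta, g0=5.0):
    """Closed-form partition function Q(f) of the toy model."""
    eg = math.exp(-g0)
    epf = math.exp(math.pi * f)
    bracket = epf - (1.0 + zeta) * eg + zeta * (epf - eg) / (g0 + math.pi * f)
    return bracket / (g0 / math.pi + f)

def mean_angle_closed(zeta, g0=5.0):
    """Closed-form mean angle, d ln Q / d f at f = 0."""
    eg = math.exp(-g0)
    num = g0 * (g0 + zeta) - zeta * (1.0 - eg)
    den = g0 * (1.0 - (1.0 + zeta) * eg) + zeta * (1.0 - eg)
    return math.pi / g0 * (num / den - 1.0)

def quadrature(f, zeta, g0=5.0, n=4000):
    """Midpoint-rule values of Q(f) and of the integral of theta times the weight."""
    d = math.pi / n
    q = 0.0
    q_theta = 0.0
    for i in range(n):
        t = (i + 0.5) * d
        x = 1.0 - t / math.pi
        w = (1.0 + zeta * x) * math.exp(-g0 * x + f * t)
        q += w
        q_theta += t * w
    return q * d, q_theta * d

if __name__ == "__main__":
    for zeta in (0.0, 1.0, 10.0, 100.0):
        q, q_theta = quadrature(0.0, zeta)
        print(zeta, mean_angle_closed(zeta), q_theta / q)
```

Agreement between the two columns for several values of `zeta` gives a quick consistency check on both the partition function and its derivative.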



In obtaining Fig. 3, Eq. (15) was used with Eq. (16) and
$C_h^l = \Gamma(l+1)/\Gamma(h+1)\Gamma(l-h+1)$, where $\Gamma(z)$ is the gamma function, such that
nonintegral values of distances could be included. For the two-dimensional
landscapes in Fig. 6, $h_{1,2}$ were restricted to integers because of the summation in Eq. (20).

The phase diagram in Fig. 3(b) was obtained by varying the width parameter $\xi$ of
the landscape (17), and locating the values of the driving force $\zeta$ for which the two
phases, organized (O) and disordered (D), have the same free energy. The coexistence
values of $\zeta$ decrease with increasing $\xi$ from top to bottom.

The validity of Eq. (20) for the total number of sequences with given distances to
two master sequences was verified by enumerating all genotypes for small $l$ and
counting the number of sequences for each set of possible distance values.